mgr/MgrStandby: respawn when deactivated #15557

Merged: liewegas merged 1 commit into ceph:master from wip-mgr-respawn on Jun 8, 2017

@liewegas liewegas commented Jun 7, 2017

  • It is ugly to unwind all of the Mgr state so that we can reactivate
    later.
  • It is perhaps impossible to shut down the Python state reliably.
  • Respawning provides a clean state and is reliable.

This mostly just copies MDSServer::respawn().

Fixes: http://tracker.ceph.com/issues/19595
Fixes: http://tracker.ceph.com/issues/19549
Signed-off-by: Sage Weil <sage@redhat.com>
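
For readers unfamiliar with the respawn approach, here is a minimal sketch of the general technique: instead of unwinding in-process state, the daemon replaces its own process image with a fresh copy of the same executable. This is not the code from this PR. The names `make_exec_argv` and `respawn_self` are hypothetical, and the sketch assumes Linux, where `/proc/self/exe` refers to the running binary.

```cpp
// Sketch of "respawn by re-exec". Hypothetical helper names; not Ceph code.
#include <unistd.h>   // execv

#include <cassert>
#include <cstdio>     // perror
#include <cstdlib>    // abort
#include <string>
#include <vector>

// Build the null-terminated argv array that execv() expects.
// The returned pointers refer into `args`, which must outlive the result.
std::vector<char*> make_exec_argv(std::vector<std::string>& args) {
  std::vector<char*> argv;
  argv.reserve(args.size() + 1);
  for (auto& a : args) {
    argv.push_back(&a[0]);
  }
  argv.push_back(nullptr);
  return argv;
}

// Re-execute the running binary with its original arguments.
// Assumption: Linux, so /proc/self/exe names the current executable.
// If that fails, fall back to args[0] as a best effort.
[[noreturn]] void respawn_self(std::vector<std::string> args) {
  assert(!args.empty());  // argv[0] (the program name) is required
  std::vector<char*> argv = make_exec_argv(args);
  execv("/proc/self/exe", argv.data());  // returns only on failure
  execv(args[0].c_str(), argv.data());
  std::perror("execv");
  std::abort();
}
```

A daemon using this pattern would typically save its arguments at startup (for example `std::vector<std::string> args(argv, argv + argc);` in `main`) and call `respawn_self(args)` when it is deactivated. Because the new process starts from scratch, no teardown of in-memory state is needed.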


liewegas commented Jun 7, 2017

I just hit another Mgr-shutdown bug in my last run:

2017-06-07T18:29:47.046 INFO:tasks.ceph.mgr.x.smithi116.stderr:src/tcmalloc.cc:278] Attempt to free invalid pointer 0x1f
2017-06-07T18:29:47.046 INFO:tasks.ceph.mgr.x.smithi116.stderr:*** Caught signal (Aborted) **
2017-06-07T18:29:47.046 INFO:tasks.ceph.mgr.x.smithi116.stderr: in thread 7f8da56cf700 thread_name:fn_anonymous
2017-06-07T18:29:47.047 INFO:tasks.ceph.mgr.x.smithi116.stderr: ceph version  12.0.2-2485-gc8340cd (c8340cde85674f8d9506d602368c2fd9a6307580) luminous (dev)
2017-06-07T18:29:47.048 INFO:tasks.ceph.mgr.x.smithi116.stderr: 1: (()+0x393172) [0x56490d9f6172]
2017-06-07T18:29:47.048 INFO:tasks.ceph.mgr.x.smithi116.stderr: 2: (()+0x113e0) [0x7f8dacb8f3e0]
2017-06-07T18:29:47.048 INFO:tasks.ceph.mgr.x.smithi116.stderr: 3: (gsignal()+0x38) [0x7f8dabb20428]
2017-06-07T18:29:47.048 INFO:tasks.ceph.mgr.x.smithi116.stderr: 4: (abort()+0x16a) [0x7f8dabb2202a]
2017-06-07T18:29:47.048 INFO:tasks.ceph.mgr.x.smithi116.stderr: 5: (tcmalloc::Log(tcmalloc::LogMode, char const*, int, tcmalloc::LogItem, tcmalloc::LogItem, tcmalloc::LogItem, tcmalloc::LogItem)+0x22e) [0x7f8dad7625ce]
2017-06-07T18:29:47.048 INFO:tasks.ceph.mgr.x.smithi116.stderr: 6: (()+0x1375f) [0x7f8dad75675f]
2017-06-07T18:29:47.048 INFO:tasks.ceph.mgr.x.smithi116.stderr: 7: (operator delete[](void*)+0x1fd) [0x7f8dad77966d]
2017-06-07T18:29:47.048 INFO:tasks.ceph.mgr.x.smithi116.stderr: 8: (std::_Rb_tree, std::allocator > >, std::pair, std::allocator > >, std::_Identity, std::allocator > > >, std::less, std::allocator > > >, std::allocator, std::allocator > > > >::erase(std::pair, std::allocator > > const&)+0x63) [0x56490d903723]
2017-06-07T18:29:47.048 INFO:tasks.ceph.mgr.x.smithi116.stderr: 9: (MetadataUpdate::finish(int)+0x43) [0x56490d905fb3]
2017-06-07T18:29:47.048 INFO:tasks.ceph.mgr.x.smithi116.stderr: 10: (Context::complete(int)+0x9) [0x56490d8cab79]
2017-06-07T18:29:47.049 INFO:tasks.ceph.mgr.x.smithi116.stderr: 11: (Finisher::finisher_thread_entry()+0x460) [0x56490da35480]
2017-06-07T18:29:47.049 INFO:tasks.ceph.mgr.x.smithi116.stderr: 12: (()+0x770a) [0x7f8dacb8570a]
2017-06-07T18:29:47.049 INFO:tasks.ceph.mgr.x.smithi116.stderr: 13: (clone()+0x6d) [0x7f8dabbf182d]
2017-06-07T18:29:47.049 INFO:tasks.ceph.mgr.x.smithi116.stderr:2017-06-07 18:29:47.048186 7f8da56cf700 -1 *** Caught signal (Aborted) **
2017-06-07T18:29:47.049 INFO:tasks.ceph.mgr.x.smithi116.stderr: in thread 7f8da56cf700 thread_name:fn_anonymous

but fixing these feels like a waste of time.

liewegas commented Jun 8, 2017

tests look okay...

@jcsp jcsp left a comment

This looks fine. I was wondering if we could avoid the copy-paste by putting the respawn bit somewhere common, but it's hardly essential.

@liewegas liewegas merged commit f05a34a into ceph:master Jun 8, 2017
@liewegas liewegas deleted the wip-mgr-respawn branch June 8, 2017 20:45